Environment Setup¶

In [1]:
# General Imports
import numpy as np
import pandas as pd
import geopandas as gpd
import os
import glob
import folium
import json
import branca.colormap as cmp
import plotly.graph_objects as go
from datetime import datetime
from copy import deepcopy
In [2]:
import plotly.offline as pyo
pyo.init_notebook_mode()

Data Loading¶

In [3]:
history_data = pd.read_csv('data/EV_Data/Electric_Vehicle_Population_Size_History_By_County.csv')
history_data.head()
Out[3]:
Date County State Vehicle Primary Use Battery Electric Vehicles (BEVs) Plug-In Hybrid Electric Vehicles (PHEVs) Electric Vehicle (EV) Total Non-Electric Vehicle Total Total Vehicles Percent Electric Vehicles
0 October 31 2019 Flathead MT Passenger 1 0 1 70 71 1.41
1 October 31 2019 New London CT Passenger 0 2 2 244 246 0.81
2 June 30 2019 Pend Oreille WA Truck 0 0 0 5730 5730 0.00
3 November 30 2019 Virginia Beach VA Passenger 1 0 1 706 707 0.14
4 February 28 2018 Sacramento CA Passenger 1 0 1 307 308 0.32
In [4]:
vehicle_data = pd.read_csv('data/EV_Data/Electric_Vehicle_Population_Data.csv')
vehicle_data.head()
Out[4]:
VIN (1-10) County City State Postal Code Model Year Make Model Electric Vehicle Type Clean Alternative Fuel Vehicle (CAFV) Eligibility Electric Range Base MSRP Legislative District DOL Vehicle ID Vehicle Location Electric Utility 2020 Census Tract
0 5YJYGDEF5L Thurston Lacey WA 98516.0 2020 TESLA MODEL Y Battery Electric Vehicle (BEV) Clean Alternative Fuel Vehicle Eligible 291 0 22.0 124535071 POINT (-122.7474291 47.0821119) PUGET SOUND ENERGY INC 5.306701e+10
1 1N4BZ1CP1K King Sammamish WA 98074.0 2019 NISSAN LEAF Battery Electric Vehicle (BEV) Clean Alternative Fuel Vehicle Eligible 150 0 45.0 102359449 POINT (-122.0313266 47.6285782) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 5.303303e+10
2 5YJXCDE28G King Kent WA 98031.0 2016 TESLA MODEL X Battery Electric Vehicle (BEV) Clean Alternative Fuel Vehicle Eligible 200 0 33.0 228682037 POINT (-122.2012521 47.3931814) PUGET SOUND ENERGY INC||CITY OF TACOMA - (WA) 5.303303e+10
3 JHMZC5F37M Kitsap Poulsbo WA 98370.0 2021 HONDA CLARITY Plug-in Hybrid Electric Vehicle (PHEV) Clean Alternative Fuel Vehicle Eligible 47 0 23.0 171566447 POINT (-122.64177 47.737525) PUGET SOUND ENERGY INC 5.303509e+10
4 WA1F2AFY4P Thurston Olympia WA 98501.0 2023 AUDI Q5 E Plug-in Hybrid Electric Vehicle (PHEV) Not eligible due to low battery range 23 0 22.0 234923230 POINT (-122.89692 47.043535) PUGET SOUND ENERGY INC 5.306701e+10
In [5]:
us_outlines = {}

# Read in all geo
for geo_json in glob.glob('data/Geo_Data/*.json'):
    with open(geo_json, 'r', encoding='utf-8', errors='ignore') as geo_data:
        us_outlines[os.path.basename(geo_json)[:-5]] = (json.load(geo_data))

Preprocessing and Data Extraction¶

To start, we must drop any rows with missing (NaN) data.

In [6]:
# Drop nan's
history_data.dropna(inplace=True)
print(f'Num of history_data NaNs: %d' % sum([history_data[col].isna().sum() for col in history_data.columns]))

vehicle_data.dropna(inplace=True)
print(f'Num of vehicle_data NaNs: %d' % sum([vehicle_data[col].isna().sum() for col in vehicle_data.columns]))
Num of history_data NaNs: 0
Num of vehicle_data NaNs: 0

The GeoJson datasets contain geo data on all relevant locations in the U.S. We must extract the counties specific to Washington for our analysis. The GeoJson for the counties uses the state number to represent the states, but our main datasets don't use this. We will need to extract this value from the states GeoJson, then use this id to extract only the counties in Washington.

In [7]:
# Get Washington's state id
for state in us_outlines['states']['features']:
    if state['properties']['NAME'] == 'Washington':
        wa_id = state['properties']['STATE']
        print(f'Washington State ID: {wa_id}')
        break
Washington State ID: 53
In [8]:
# Get geo data of all counties in washington
wa_counties = []
for county in us_outlines['counties']['features']:
    if county['properties']['STATE'] == '53':
        wa_counties.append(county)
wa_county_outlines = deepcopy(us_outlines['counties'])
wa_county_outlines['features'] = wa_counties
print(f'Number of counties: %d' % len(wa_counties))
Number of counties: 39

Each row of vehicle_data is an individual vehicle with information on where it resides. A new dataframe must be made to hold the numerical counts of EVs in general per county to use in our choropleths

In [9]:
# Get count of ev's in each county
total_ev_per_county = pd.DataFrame(vehicle_data['County'].value_counts())
total_ev_per_county.rename(columns={'County': 'Count'}, inplace=True)
total_ev_per_county.head()
Out[9]:
Count
King 84940
Snohomish 19012
Pierce 12575
Clark 9595
Thurston 5880

We will transform the date in history_data into DateTime to extract date/time parts more easily.

In [10]:
import calendar

# Get month name to number mapper
month_mapper = {month_name: month_num for month_num, month_name in enumerate(calendar.month_name)}
month_mapper.pop('')
month_mapper

# Transform date string (if it is a string) to datetime object
if pd.api.types.is_string_dtype(history_data['Date'].dtype):
    history_data['Date'] = history_data['Date'].str.split().apply(lambda x: datetime.strptime(
                                                                    '/'.join([str(month_mapper[x[0]])] + x[1:]),
                                                                    '%m/%d/%Y'
                                                                ))

Analysis¶

The data for each county can be assumed to be approximately the population data as the dataset source is described to show all the EVs "that are currently registered through Washington State Department of Licensing (DOL)".

In [11]:
# Get year and count number of vehicles registered **per** year
register_per_year = dict(history_data['Date'].dt.year.value_counts())
register_per_year = {year: register_per_year[year] for year in sorted(register_per_year)}

# Compute cumulative sum of vehicle registered in each year
cum_year_count = {}
cumsum = 0
for year in register_per_year:
    cum_year_count[year] = cumsum + register_per_year[year]
    cumsum = cum_year_count[year]

annual_ownership_scatter = go.Figure(data=[go.Scatter(x=list(cum_year_count.keys()), 
                                                      y=list(cum_year_count.values()))])
annual_ownership_scatter.show()
In [12]:
# Compute change in count of vehicle registration per year
annual_registration_change = {}
prev_year = ('0000', 0)
for year in register_per_year:
    annual_registration_change[f'{prev_year[0]} to {year}'] = register_per_year[year] - prev_year[1]
    prev_year = (year, register_per_year[year])

# Plot change in a bar plot, omitting first change (no change before 2017)
annual_reg_change_hist = go.Figure()
annual_reg_change_hist.add_trace(go.Bar(
        x=list(annual_registration_change.keys())[1:],
        y=list(annual_registration_change.values())[1:],
        text=list(annual_registration_change.values())[1:],
        textposition='auto',
))

# Change the bar mode to have negative change under 0
annual_reg_change_hist.update_layout(barmode='relative', title_text='Change in # of Registration of EVs Between Years')
annual_reg_change_hist.show()

Total EVs owned per year is more or less consistent from 2017 to 2023. At first glance, the scatter plot provides the false implication that a simple linear regression is a safe option to predict predict the likely number of EVs in the coming years. However, if we look at the change in EV registration rate per year, we do actually see a drastic dip going into 2023. In fact, there is negative trend in the rate at which we are registering new EVs. From this, it may actually be safer to apply a logarithmic regression. However, it's possible we don't have the rest of the december data yet if we analyze the trends per month, we can adjust the 2022 to 2023 rate with predicted amount of EVs to register for december, then make a more accurate prediction of the trend. We can evaluate MAPE and use the MAPE to evaluate We'll do "years since 2017" for our predictor value

In [13]:
# Plot months in all years on same plot

# Compute mean and variance per year

# Make claim on mean and variance with certain confidence

Maps¶

In [14]:
def compute_county_center(coords):
    '''
    Compute the center of each county given the list of coords by getting the max and min of the x or y coords
    for the county and dividing by 2 to get the center of the min and max of the x or y.
    '''
    def min_max_county_coords(sum_coords, test = False):
        min_x = np.inf
        max_x = -np.inf
        min_y = np.inf
        max_y = -np.inf
        
        for coord in sum_coords:
            if isinstance(coord[0], list):
                mm_coords = min_max_county_coords(coord, True) # recursively get min_max of nested coord lists(due to islands)
                    
                min_x = min(min_x, mm_coords[0][0])
                max_x = max(max_x, mm_coords[0][1])
                min_y = min(min_y, mm_coords[1][0])
                max_y = max(max_y, mm_coords[1][1])
            else:
                min_x = min(min_x, coord[0])
                max_x = max(max_x, coord[0])
                min_y = min(min_y, coord[1])
                max_y = max(max_y, coord[1])
            
        return [[min_x, max_x], [min_y, max_y]]
    assert isinstance(coords, list)
    
    mm_coords = min_max_county_coords(coords)
        
    return [sum(mm_coords[0]) / 2, sum(mm_coords[1]) / 2]
In [15]:
# Linear color map function for chropleth to plot continuous heatmap coloring.
linear_color = cmp.LinearColormap(
    ['#F2FFDB', '#8B0000'],
    vmin=min(total_ev_per_county['Count']), vmax=max(total_ev_per_county['Count']),
    caption='Total # of EVs in 2023'
)

# Create choropleth plot of EV Ownership in Seattle per county
ev_own_choro = folium.Map([47.751076, -120.740135], zoom_start=6.5)
wa_choro = folium.features.GeoJson(
    wa_county_outlines,
    style_function=lambda county: {
        'fillColor': linear_color(total_ev_per_county.loc[county['properties']['NAME']]['Count']),
        'fillOpacity': 0.9,
    },
    highlight_function=lambda county: {
        'fillColor': '#000000', 
        'fillOpacity': 0.20
    },
    tooltip=folium.features.GeoJsonTooltip(fields=['NAME'], aliases=['County'])
)

# Add choropleth properties
ev_own_choro.add_child(wa_choro)
ev_own_choro.add_child(linear_color)

# Add count markers on each county
for county in wa_county_outlines['features']:
    center_coord = compute_county_center(county['geometry']['coordinates'])
    county_name = county['properties']['NAME']
    county_ev_count = total_ev_per_county.loc[county_name]['Count']
    
    folium.Marker(location=[center_coord[1], center_coord[0]], icon=folium.DivIcon(
                      f"<div style='font-size: 15pt'>{county_ev_count}</div>")
                 ).add_to(ev_own_choro)

ev_own_choro
Out[15]:
Make this Notebook Trusted to load map: File -> Trust Notebook

TODO: MAY REMOVE¶

The datasets use abbreviations for the, so we'll need to convert the abbreviation to the actual names for folium/geopandas. Once they've been converted, all rows with any NaN will be dropped as they will affect the visualization and predictive models.

In [16]:
state_abbrev_raw = pd.read_csv('data/states_abbrev.csv')
state_abbrev_mapper = dict(zip(state_abbrev_raw['Abbreviation'], state_abbrev_raw['State']))
history_data['State'] = history_data['State'].replace(state_abbrev_mapper)
vehicle_data['State'] = vehicle_data['State'].replace(state_abbrev_mapper)

The GeoJson dataset is a simple dictionary. We will need to convert to a geopandas dataframe. Since we're only interested in the geo data, we can drop all columns except the County name and the geometry data.

In [17]:
wa_counties_gdf = gpd.GeoDataFrame.from_features(wa_county_outlines).drop(columns=['GEO_ID',
                                                                                         'LSAD',
                                                                                         'CENSUSAREA',
                                                                                         'COUNTY',
                                                                                         'STATE'])
wa_counties_gdf.rename(columns={'NAME': 'County'}, inplace=True)
wa_counties_gdf.head()
Out[17]:
geometry County
0 POLYGON ((-118.04115 46.77602, -118.04117 46.7... Adams
1 POLYGON ((-116.91750 45.99547, -116.94068 45.9... Asotin
2 POLYGON ((-119.29769 45.93574, -119.29879 45.9... Benton
3 POLYGON ((-121.15190 47.86676, -121.15251 47.8... Chelan
4 POLYGON ((-123.78127 48.15561, -123.77947 48.1... Clallam